Skip to content

ci/security/test: verify strategy profile sync after deploy#139

Merged
Pigbibi merged 1 commit into
mainfrom
codex/audit-fix-20260702-0213
Jul 1, 2026
Merged

ci/security/test: verify strategy profile sync after deploy#139
Pigbibi merged 1 commit into
mainfrom
codex/audit-fix-20260702-0213

Conversation

@Pigbibi

@Pigbibi Pigbibi commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Retry the post-deploy strategy profile KV sync until the deployed Worker reports the bundled profile count.
  • Verify /api/strategy-profiles returns the expected profile count after sync.
  • Add a regression assertion so the deploy workflow keeps this fail-closed verification.

Problems found

  • A post-merge smoke test showed the deploy workflow succeeded while the live strategy profile KV still served the previous 18-profile catalog. The deploy job called the Worker sync endpoint immediately after wrangler deploy, which can hit the previous Worker version during edge propagation.

Fixes applied

  • Compute the expected bundled profile count from strategy-profiles.example.json.
  • Retry /api/internal/sync-strategy-profiles until the response count matches the bundle.
  • Fail the deploy if the synced count or live /api/strategy-profiles count does not match.

Security impact

  • No secret values changed or logged.
  • Existing bearer token usage remains masked by GitHub Actions.
  • The workflow now fails closed instead of reporting a successful deploy with stale KV state.

Architecture impact

  • No runtime API or data model changes.
  • Strengthens the release gate between Worker deployment and KV-backed public config.

Tests run

  • actionlint
  • bash -n on the extracted sync step script
  • .venv/bin/python -m unittest python.tests.test_runtime_settings.RuntimeSettingsTest.test_strategy_switch_console_deploy_workflow_syncs_bundled_profiles -v
  • .venv/bin/python -m unittest discover -s python/tests -v
  • uvx ruff check python
  • uvx ruff format --check python
  • git diff --check

Failed or skipped checks with reasons

  • Full live deploy verification is expected to run after merge on the protected main workflow.

Deployment notes

  • This PR affects only .github/workflows/deploy-strategy-switch-console.yml and its workflow regression test.
  • The main-branch deploy should update KV to the 19-profile bundle and verify the public endpoint count.

Rollback plan

  • Revert this commit to return to the previous one-shot sync behavior.
  • If needed, rerun the previous successful main workflow or redeploy the last known-good commit.

Manual follow-up checklist

  • Confirm the main-branch deploy run reports Strategy profile KV sync verified with 19 profiles.
  • Confirm /api/strategy-profiles includes the current bundle after deployment.

@Pigbibi Pigbibi added codex AI Codex operations automated-audit Automated audit changes security Security-related review or hardening tests Test coverage or validation changes ci Continuous integration or deployment changes needs-review Needs human review before merge labels Jul 1, 2026
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown

🤖 Codex PR Review

⚠️ Review skipped: The Codex review could not be completed.

No Codex service URL or API key configured. Set CODEX_AUDIT_SERVICE_URL, ANTHROPIC_API_KEY, or OPENAI_API_KEY.

Please ensure a human reviewer checks this PR before merging.

@Pigbibi Pigbibi removed the needs-review Needs human review before merge label Jul 1, 2026
@Pigbibi Pigbibi merged commit c209379 into main Jul 1, 2026
6 checks passed
@Pigbibi Pigbibi deleted the codex/audit-fix-20260702-0213 branch July 1, 2026 18:21

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27e162e29e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +168 to +169
if [ "$actual_count" = "$expected_count" ]; then
break

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Verify profile content, not just count

When a push changes strategy-profiles.example.json without changing the number of entries, an old Worker during deployment propagation can still return the same strategy_profiles_count, so this exits the retry loop before the newly deployed Worker has written its bundled profiles. In worker.js:1092-1120 the sync writes DEFAULT_STRATEGY_PROFILES but this workflow only observes the count, and the later live check also compares only length, so the job can succeed while KV remains on stale profile metadata.

Useful? React with 👍 / 👎.

Comment on lines +194 to +196
if [ "$live_count" != "$expected_count" ]; then
echo "Live strategy profile count $live_count; expected $expected_count." >&2
exit 2

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Retry the live KV read before failing

When the sync write succeeds but this immediate public read is served from an edge location with a cached strategy_profiles KV value, Cloudflare KV reads can remain eventually consistent for 60s+, so failing on the first live_count mismatch can mark a valid deploy as failed. The sync POST is retried for propagation, but the live /api/strategy-profiles verification is not, which makes this new release gate flaky after real profile-count changes.

Useful? React with 👍 / 👎.

import sys

payload = json.load(open(sys.argv[1], encoding="utf-8"))
print(payload.get("strategy_profiles_count", ""))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fail when the sync endpoint skips KV

If the deployed Worker lacks the STRATEGY_SWITCH_CONFIG binding, syncStrategyProfilesResponse returns HTTP 200 with ok: true, strategy_profiles_sync.skipped: true, and the bundled count (worker.js:1077-1088). Because this script extracts only strategy_profiles_count, it reports Strategy profile KV sync verified even though no KV write occurred and the public endpoint can still pass by serving the bundled fallback.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automated-audit Automated audit changes ci Continuous integration or deployment changes codex AI Codex operations security Security-related review or hardening tests Test coverage or validation changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant